Introduction: Artificial intelligence (AI) is increasingly being incorporated into healthcare as a tool to support clinical decision making and enhance patient care. Professional societies such as ASH, as well as private companies such as Primum, Inc., offer free consultations in which appropriate experts provide personalized responses to clinician-submitted hematology cases. However, the potential of AI tools to supplement or supplant expert consultation remains uncertain. We report the results of a study comparing AI and expert physician responses to 107 real-world hematology cases.

Methods: The 107 cases comprised lymphomas (n=30), myeloma (n=24), leukemias (n=11), myeloid disorders (n=10), and classical hematology (n=32), answered by 20 unique experts. Expert responses to these de-identified cases, submitted by practicing clinicians to the Primum platform (www.primum.co/) between June 2022 and July 2023, were compared with GPT-4 responses (openai.com/chatgpt/). The instructional prompt to GPT-4 was: “You are an expert oncologist conversing with another oncologist as a peer. You prefer to rely on guidelines and data published in reputable medical journals when responding.” Five expert faculty at our institution adjudicated the blinded, paired responses, recording their preference, quality and practical value scores, and a prediction of which response was AI generated. Scores were compared between the expert and AI groups by t-test to generate P-values, and Pearson correlation was used for comparisons between adjudication scores.
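The statistical comparison described above can be sketched as follows. This is a minimal illustration, not the study's actual analysis code or data: the Likert scores below are hypothetical placeholders, and SciPy's `ttest_ind` and `pearsonr` stand in for whatever statistical software was actually used.

```python
# Illustrative sketch only: comparing adjudicator Likert scores (0-4)
# between expert and AI responses with an independent-samples t-test,
# and correlating two adjudication scores with Pearson's r.
from scipy import stats

# Hypothetical quality scores from adjudicators (Likert 0-4)
expert_quality = [2, 3, 2, 1, 2, 3, 2, 2, 1, 2]
ai_quality = [2, 2, 3, 2, 1, 3, 2, 2, 2, 2]

# Two-sample t-test for a difference in mean quality score
t_stat, p_value = stats.ttest_ind(expert_quality, ai_quality)
print(f"t = {t_stat:.2f}, P = {p_value:.2f}")

# Hypothetical paired quality and practical-value scores for one group
quality = [2, 3, 2, 1, 2, 3, 2, 2, 1, 2]
practical_value = [2, 3, 3, 1, 2, 2, 2, 3, 1, 2]

# Pearson correlation between the two adjudication scores
r, p = stats.pearsonr(quality, practical_value)
print(f"Pearson r = {r:.2f}, P = {p:.2f}")
```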

Results: Expert responses were preferred by >50% of adjudicators in 75% of cases (deviation ±25%). Randomized AI responses were correctly identified as AI generated 90% of the time. Mean expert vs AI scores (Likert scale 0-4) were equivalent for quality (2.0 vs 2.1, P=0.9) and practical value (2.1 vs 2.1, P=0.9). Interestingly, AI responses were preferred in 46% (n=15) of classical hematology cases and 31% (n=9) of lymphoma cases, largely because they were more concise. However, there was no association between high practical value scores and disease subtype in either group.

Conclusions: Expert physician responses were preferred for most cases, suggesting an implicit value of personalized responses over AI. There were no significant differences in quality or practical utility between AI-generated responses and those from experts, reflecting similar information extracted from standardized guidelines. Our findings may be limited by the broad range of hematologic conditions covered, for which experts and guidelines vary. Overall, these data suggest that while AI can supplement knowledge of management paradigms by providing basic management strategies, at present it cannot replace expert consultation in clinical practice.

Disclosures

Struck:Primum, Inc: Current Employment. Saint Fleur-Lominy:AstraZeneca: Consultancy, Other: consultation. Braunstein:CTI Biopharma: Consultancy, Honoraria; Cardinal Health: Consultancy, Honoraria; Bristol Myers Squibb: Consultancy, Honoraria; Pfizer: Consultancy, Honoraria; Lava Therapeutics: Consultancy, Honoraria; Seagen: Consultancy, Honoraria; Guidepoint Global: Consultancy, Honoraria; AstraZeneca: Consultancy, Honoraria; Janssen: Consultancy, Honoraria, Research Funding, Speakers Bureau; Epizyme: Consultancy, Honoraria; Abbvie: Consultancy, Honoraria; Sanofi: Consultancy, Honoraria.
